head(filter_lessthan_3809)
##           X_id         sire          dam genotyped  pou sex bwt.adjPhen
## 1 3.680232e+12 3.632287e+12 3.632290e+12         0 3680   1       3.188
## 2 3.680232e+12 3.632287e+12 3.632290e+12         0 3680   2      -8.311
## 3 3.680232e+12 3.632287e+12 3.632290e+12         1 3680   2      -3.311
## 4 3.680232e+12 3.632287e+12 3.632290e+12         0 3680   1      -3.812
## 5 3.680232e+12 3.632287e+12 3.632290e+12         0 3680   2      45.689
## 6 3.680232e+12 3.632287e+12 3.632294e+12         0 3680   1       3.188
##   lfi.adjPhen afi.adjPhen wtg.adjPhen hhp.adjPhen
## 1        -999    -999.000    -999.000        -999
## 2        -999    -999.000    -999.000        -999
## 3        -999    -999.000    -999.000        -999
## 4        -999      11.795       7.871        -999
## 5        -999    -999.000    -999.000        -999
## 6        -999    -999.000    -999.000        -999

Introduction: This phenotype dataset contains the animals and their pedigree. It includes an indicator of whether the animal is genotyped (1) or not (0). In total there are 5 traits: body weight (BWT), feed intake 1 (LFI), feed intake 2 (AFI), weight gain (WTG) and egg production (HHP). To simplify things, the phenotypes have been pre-corrected for the fixed effects. Missing values are denoted as -999.0.
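Because -999 is a numeric sentinel, it will silently distort means and correlations unless it is recoded first. A minimal sketch of that recoding in base R follows; the data frame `pheno` and its values are made up for illustration, and selecting trait columns by the `.adjPhen` suffix is an assumption based on the column names shown above.

```r
# Toy data frame mimicking the layout above (values are illustrative only).
pheno <- data.frame(
  pou         = c(3680, 3680, 3700),
  bwt.adjPhen = c(3.188, -8.311, 45.689),
  afi.adjPhen = c(-999, 11.795, -999),
  hhp.adjPhen = c(-999, -999, 2.5)
)

# Assumption: trait columns are the ones whose names end in ".adjPhen".
trait_cols <- grep("\\.adjPhen$", names(pheno), value = TRUE)

# Replace the -999 sentinel with NA so that R's missing-value tools apply.
pheno[trait_cols] <- lapply(pheno[trait_cols], function(x) {
  replace(x, which(x == -999), NA)
})

# Number of missing values per trait.
colSums(is.na(pheno[trait_cols]))
```

Whether to recode to `NA` or keep the sentinel depends on the downstream software; some genetic evaluation programs expect the -999 code in their input files.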

Column POU represents the groups of selection candidates. Animals are separated into training and testing sets by masking the phenotypes of the testing animals. All animals have a phenotype for BWT. To evaluate HHP, you will need to mask POU 3809 and discard anything in POUs greater than this one. Although random partitioning of training and testing animals is often used, we would rather see a more realistic scenario in which the training animals are born before the testing animals. Our aim is therefore to evaluate the counts, correlations and distributions of the traits before the actual analysis.
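The split described above can be sketched in base R as follows. This is only an illustration on made-up data: the object names (`toy`, `kept`, `is_test`, `hhp.true`) are hypothetical, and keeping a copy of the true phenotype for later validation is an assumption about how the masked values would be used.

```r
# Toy data: POU group and the HHP phenotype (values are illustrative only).
toy <- data.frame(
  pou         = c(3680, 3700, 3809, 3809, 3900),
  hhp.adjPhen = c(1.2, -0.4, 0.9, -999, 2.1)
)
target_pou <- 3809

# Discard every group after the testing group.
kept <- toy[toy$pou <= target_pou, ]

# Animals in the target group form the testing set; earlier groups train.
kept$is_test <- kept$pou == target_pou

# Keep the observed phenotype for validation, then mask it with the
# dataset's missing-value code.
kept$hhp.true <- kept$hhp.adjPhen
kept$hhp.adjPhen[kept$is_test] <- -999

# Training vs testing counts.
table(kept$is_test)
```

Splitting on POU rather than at random keeps the training animals in groups that precede the testing group, which is the scenario the text asks for.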

Non-missing

Fig.1:

The counts of non-missing values for the training set were obtained by filtering for groups (POU) with numbers less than 3809. The total data set contains 303610 observations for each phenotype. The non-missing trait count plot shows, for each of the 5 traits, the number of non-missing values out of the total observations in our filtered dataset.
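A count of this kind could be computed roughly as below. The sketch uses a small made-up data frame rather than the real data, and the name `train` is a hypothetical stand-in for the filtered object.

```r
# Toy data in the same layout as the phenotype file (illustrative values).
toy <- data.frame(
  pou         = c(3680, 3680, 3700, 3750, 3809, 3900),
  bwt.adjPhen = c(3.188, -8.311, -3.311, -3.812, 45.689, 3.188),
  lfi.adjPhen = c(-999, 5.1, -999, -999, 2.0, -999),
  afi.adjPhen = c(-999, -999, 11.795, 9.2, -999, -999),
  wtg.adjPhen = c(-999, -999, 7.871, 6.3, -999, -999),
  hhp.adjPhen = c(-999, 1.4, -999, -999, 0.7, -999)
)

# Keep only groups before POU 3809.
train <- toy[toy$pou < 3809, ]

# Assumption: trait columns are those ending in ".adjPhen".
trait_cols <- grep("\\.adjPhen$", names(train), value = TRUE)

# Number of non-missing records per trait (-999 marks a missing value).
n_obs <- colSums(train[trait_cols] != -999)
n_obs

# Bar plot of the counts, one bar per trait.
barplot(n_obs, ylab = "Non-missing observations", las = 2)
```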

Traits Venn

Fig.2:

A Venn diagram is constructed to show the overlap between non-missing trait observations in our filtered dataset. In combination with the bar plot, it shows that body weight overlaps with all the other traits, whereas the remaining traits fall into 2 disjoint groups.

Based on its non-missing observations, feed intake 1 (LFI) seems to be correlated with the egg production trait (HHP), while the non-missing observations of feed intake 2 (AFI) are correlated with those of weight gain (WTG).
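The overlap pattern can also be checked numerically, without drawing a Venn diagram, by counting how many animals have records for each pair of traits. The sketch below does this with a cross-product of a 0/1 presence matrix; the toy data and the names `present` and `overlap` are hypothetical.

```r
# Toy data with the same missing-value code (illustrative values only).
toy <- data.frame(
  bwt.adjPhen = c(3.188, -8.311, -3.311, -3.812, 45.689, 3.188),
  lfi.adjPhen = c(-999, 5.1, -999, -999, 2.0, -999),
  afi.adjPhen = c(-999, -999, 11.795, 9.2, -999, -999),
  wtg.adjPhen = c(-999, -999, 7.871, 6.3, -999, -999),
  hhp.adjPhen = c(-999, 1.4, -999, -999, 0.7, -999)
)

# 1 where a record is present, 0 where it is missing.
present <- (toy != -999) * 1
colnames(present) <- sub("\\.adjPhen$", "", colnames(present))

# t(present) %*% present: the diagonal holds per-trait counts, and each
# off-diagonal entry holds the number of animals with both traits recorded.
overlap <- crossprod(present)
overlap
```

In this toy example the entries for the pairs LFI/HHP and AFI/WTG are non-zero while the entries between the two pairs are zero, which is the disjoint structure the text describes.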

Fig.3:

To summarise the disjoint sets of correlations among the traits, this matrix of plots was generated by pairwise filtering for non-missing values. It shows a pattern similar to the one observed in the Venn diagram, suggesting high correlation within each of 2 disjoint sets of 2 traits. The matrix also contains the overall density of each trait's non-missing observations across all groups.
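One way to obtain pairwise-filtered correlations in base R is `cor()` with `use = "pairwise.complete.obs"`, after recoding -999 to `NA`. The sketch below is an assumption about how such a matrix could be produced, not the code behind the figure, and the data are made up.

```r
# Toy data (illustrative values only); -999 marks a missing record.
toy <- data.frame(
  bwt.adjPhen = c(3.2, -8.3, -3.3, -3.8, 45.7, 3.2),
  lfi.adjPhen = c(-999, 5.1, -999, 2.0, -999, 3.3),
  afi.adjPhen = c(11.8, -999, 9.2, -999, 10.1, -999),
  wtg.adjPhen = c(7.9, -999, 6.3, -999, 7.0, -999),
  hhp.adjPhen = c(-999, 1.4, -999, 0.7, -999, 1.0)
)

# Recode the sentinel so cor() can drop missing pairs.
toy[toy == -999] <- NA

# Each correlation uses only the animals with both traits recorded.
# Pairs of traits with no animals in common come out as NA.
r <- cor(toy, use = "pairwise.complete.obs")
round(r, 2)

# Number of animals behind each pairwise correlation.
n_pairs <- crossprod(!is.na(toy) * 1)
n_pairs
```

Reporting `n_pairs` next to `r` is useful here, since a correlation based on a handful of shared records says little, and traits from different disjoint sets have no shared records at all.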

Correlation Matrix Plots
